Using this text tutorial is not recommended. It is better to open KnowledgeMiner, go to the Help (?) menu, and use the interactive Apple Guide version of the tutorial within KnowledgeMiner. This older version of the tutorial is here only for your reference.
KnowledgeMiner Tutorial
The information here is just the text of the Apple Guide version.
INTRODUCTION
INTRO1
Welcome to KnowledgeMiner
This tutorial was designed to help you quickly open up new possibilities in your daily work using the extraordinary modeling features of KnowledgeMiner. It is an Artificial Intelligence tool built on the cybernetic principles of self-organization: it learns a completely unknown relationship between the output and the inputs of any given system in an evolutionary way, growing from a very simple organization to an optimal complex one. The main advantages of this inductive approach are:
• only minimal, uncertain a priori information about the system is required,
• a very fast and effective learning process, even on ordinary PCs,
• modeling on very short and noisy data samples,
• output of an optimal complex analytical model,
• a transparent explanation component.
These advantages over statistical methods as well as over neural networks make KnowledgeMiner applicable to a wide range of real-world problems and make it one of the most effective modeling and prediction tools available.
INTRO2
In this tutorial we will work on the COD concentration document located in the KnowledgeMiner folder. This document contains monthly observational data from Osaka Bay, collected to solve a water pollution problem. The COD concentration variable (COD: Chemical Oxygen Demand) will be used as an indicator of water pollution. So, we want to model and predict this variable (and others) to get information on how the water quality will change in the next five months.
INTRO3
To give you a better understanding of model self-organization, the following example animation shows how an optimal complex model grows in an evolutionary process of combination and selection (knowledge extraction) out of a completely unknown relationship between the output variable and the chosen input variables (black box).
THE DATA BASIS
DB1
The data basis is the main source for model building in KnowledgeMiner. Therefore, each KnowledgeMiner document consists of one data basis located in the 'Data: ' window. Only one document can be opened at a time.
DB2
Do This
If you have not already done so, please open the document 'COD concentration' located in the KnowledgeMiner folder. If the Open menu item is dimmed, close the currently open document.
If you need instructions on closing a document, click Huh? below.
DB3
After opening a document the data basis is always visible in the 'Data:' window. The picture below describes the general construction of the data basis:
THE INFORMATION BASIS
Intro IB
The information basis contains the complete data set you choose to serve as the information source for model self-organization. Defining the information basis is the most important task you have to do before creating a model, and it should be done as carefully as possible. Generally, you have to ask yourself the question 'Which input variables could reasonably affect my output variable?'. Note that selecting the information basis only pre-defines which dependencies between input variables and the output variable might possibly exist. It defines a set of candidate variables from which only a subset of really relevant variables will be selected during model self-organization (knowledge extraction).
In this section you will learn how to define the information basis for modeling easily.
If you want additional information about inductive learning modeling, click Huh? below.
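The idea that only a subset of really relevant variables survives can be illustrated with a minimal Python sketch. It is not KnowledgeMiner's algorithm: it assumes that each candidate input is fitted with a simple straight line on the first rows (learning set) and ranked by its squared error on the remaining rows (checking set). The names fit_line, checking_error and select_inputs, as well as the sample numbers, are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def checking_error(model, xs, ys):
    """Mean squared error of the fitted line on rows not used for fitting."""
    a, b = model
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_inputs(inputs, y, n_learn, keep):
    """Rank candidate inputs by checking error and keep the best ones.

    Assumption: the first n_learn rows are the learning set and the
    remaining rows are the checking set.
    """
    scored = []
    for name, xs in inputs.items():
        model = fit_line(xs[:n_learn], y[:n_learn])
        error = checking_error(model, xs[n_learn:], y[n_learn:])
        scored.append((error, name))
    scored.sort()
    return [name for _, name in scored[:keep]]


# Made-up data: the output follows X2 exactly, X3 is unrelated.
inputs = {
    "X2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "X3": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
}
y = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
selected = select_inputs(inputs, y, n_learn=4, keep=1)
print(selected)
```

In this toy case the unrelated candidate X3 is dropped because it predicts the checking rows badly, which is the role the information basis leaves to the modeling process.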
IB1
Do This
Choose the menu item Selection Mask On in the Table menu.
This option will help you build the information basis by selecting the corresponding cells.
IB2
Do This
To select an output variable, click in the first row (generally, the name of the variable) of the column you want to model.
Here we want to model the COD conc. variable.
A "Y" in the head of the column marks the selected variable as the output variable.
For a picture of output variable selection, click Huh? below.
IB3
Do This
To select any combination of input variables and time lags, hold down the Shift or Command key and click in a cell in the table. Generally, the column specifies the input variable and the row specifies the time lag of the variable, corresponding to the number shown on the left side of the table. At least two variables have to be selected.
For instance, if you select two cells in the third column, in the first and second rows, they would be interpreted as X2(t) and X2(t-1). In this case you would have specified the following information basis:
Y(t)=f(X2(t), X2(t-1)).
For a picture of input variable selection, click Huh? below.
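As a rough sketch of what such a selection means for the data, the following Python fragment turns a list of (variable, time lag) pairs into rows of lagged input values for Y(t)=f(X2(t), X2(t-1)). The function name build_information_basis and the sample numbers are hypothetical; how KnowledgeMiner stores the selection internally is not described here.

```python
def build_information_basis(columns, output, inputs):
    """Build (input_values, output_value) rows from a selection of cells.

    columns: dict mapping a variable name to its values, ordered in time.
    output:  name of the output variable Y.
    inputs:  list of (name, lag) pairs, e.g. ("X2", 1) stands for X2(t-1).
    Rows are produced only for times t at which every lagged value exists.
    """
    max_lag = max(lag for _, lag in inputs)
    rows = []
    for t in range(max_lag, len(columns[output])):
        x = [columns[name][t - lag] for name, lag in inputs]
        rows.append((x, columns[output][t]))
    return rows


# Made-up data for two columns of the table.
columns = {
    "X1": [5.0, 6.0, 7.0, 8.0],
    "X2": [1.0, 2.0, 3.0, 4.0],
}
rows = build_information_basis(columns, "X1", [("X2", 0), ("X2", 1)])
for x, y_value in rows:
    print(x, y_value)
```

Note that one row is lost for every time lag, because X2(t-1) does not exist for the first row of the table.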
IB4
You have learned how to choose output and input variables by clicking in the corresponding cells in the table. Now you are prepared to create a model. This is described in the next section.
TIME SERIES MODELS
Intro
In this section you will learn how to create a time series model and how to make predictions with it. You should have completed the previous sections on the data basis and on defining the information basis for modeling.
TS1
Do This
Choose the menu item Selection Mask On in the Table menu. If this menu item has been replaced by the name Selection Mask Off, you are already working in this mode.
This option will help you define the information basis by selecting the corresponding cells.
TS2
Do This
First, you have to select the output variable. Click in the first row of column X1 (COD conc.) to mark it as our output variable.
A "Y" in the head of the column marks this variable as the selected output variable.
For a picture of output variable selection, click Huh? below.
TS3
Do This
Suppose we have decided in this case to consider all lagged samples Y(t-n), n=1, 2, ..., 5. Try to select the corresponding cells in the table.
For a picture of the correct selection, click Huh? below.
TS4
Now that the output and input variables are selected, we are ready to create the time series model, which is described in the next topic.
TS5
Do This
Choose the menu item Create Time Series Model... from the Modeling menu.
TS6
Here you can get an overview of the considered output and input variables. Invalid variable selections made previously in the table will be excluded automatically from the information basis.
TS7
Do This
Specify the number of data rows, counted from the top of the table, to be used as the learning and checking set for model synthesis. The minimum data length is 6.
Here we want to choose a length of 40.
TS8
Do This
If you have not specified the time lags in the table, you can do so in this field, provided the field is enabled (not dimmed).
TS9
Do This
Select whether the model should be a linear or a nonlinear one. Since each partial model (Active Neuron) as well as the whole network structure will be synthesized and optimized automatically, choosing a nonlinear model does not necessarily lead to a nonlinear network model in the end. If the best model detected is a linear one, the algorithm will present this linear model as the most accurate.
TS10
Here, the expected memory requirements of the modeling process for the first four layers are displayed. After changing the data length or lag time in the dialog, you can check the memory requirements again by clicking on the "Memory" string.
TS11
You have learned how to set up a few parameters in the dialog window. Now you can start the modeling process by clicking the Modeling button in the dialog window.
TIME SERIES MODELS - PREDICTIONS
TS Pred1
Now you have built your first time series model.
Time series models (or autoregressive models) can be identified in the Models menu by the suffix '-AR' added to the name of the model. As you may have seen, immediately after the modeling process finishes, the 'Graph: ' window appears to visualize the performance of the model in comparison to the original process. You may also have noticed that the modeling process stops by itself, without any pre-defined stopping point. That is, it stops when the algorithm has synthesized an optimal complex model and detects that it would begin to overfit the design data (learning and checking data set). This feature is one important advantage of KnowledgeMiner over deductive methods like statistical regression or Neural Networks.
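The self-stopping behavior can be pictured with a small Python sketch. It assumes, purely for illustration, that the model grows layer by layer and that each layer is scored by its error on the checking data; growth stops at the first layer that no longer improves this error. The function name stopping_layer and the error values are hypothetical, and the criterion KnowledgeMiner actually uses is not described in this tutorial.

```python
def stopping_layer(checking_errors):
    """Return the index of the last layer that still improved the checking error.

    checking_errors[k] is the checking-set error of the best model in layer k.
    Growth stops as soon as a new layer fails to beat the best error so far,
    which is taken here as the sign of beginning overfitting.
    """
    best = 0
    for layer in range(1, len(checking_errors)):
        if checking_errors[layer] >= checking_errors[best]:
            break
        best = layer
    return best


# Made-up checking errors for five layers of growing complexity.
errors = [0.9, 0.5, 0.3, 0.35, 0.2]
print(stopping_layer(errors))
```

With these numbers the sketch stops at layer 2. The later value 0.2 is never reached, because the rule stops at the first layer that fails to improve.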
TS Pred2
Do This
Not only a graphic representation of the model is immediately available: KnowledgeMiner also presents an analytical description of each model in its tree-like structure.
Choose the menu item Model Equation in the Window menu to have a look at the model equation. There you will also find the heuristics chosen for that model.
TS Pred3
Do This
In our example, the true data of the forecast horizon are available, and we want to use them for model performance validation. These validation data need to be stored in the same column, below the design data. Therefore, you have to locate the corresponding first data value (row) in the table.
Scroll the 'Data:' window down until rows 40-55 are visible.
TS Pred4
Do This
Now click in row 48 of column X1 (our output variable). This cell contains the first true data value of the forecast horizon.
For a picture, click Huh? below.
TS Pred5
Do This
Choose the menu item Original Data Begin in This Row in the Table menu.
TS Pred6
Do This
To make a status quo prediction, all cells of the forecast area in the table must be empty. Otherwise, you would see in the 'Graph: ' window predicted values calculated from the already existing data (which would be a one-step What-If prediction).
Clear all red colored cells of column X1 from row 41 to row 45.
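The difference between the two kinds of prediction can be sketched in Python for a linear autoregressive model. In the status quo case each predicted value is fed back as a lagged input for the next step; in the one-step case the existing true values are used instead. The function names, the coefficients and the data are made up for illustration and do not come from the COD document.

```python
def status_quo_forecast(history, coeffs, intercept, horizon):
    """Predict `horizon` steps ahead, feeding predictions back as inputs.

    coeffs[i] is the weight of Y(t-1-i), so coeffs[0] belongs to Y(t-1).
    """
    values = list(history)
    for _ in range(horizon):
        lags = values[-1:-len(coeffs) - 1:-1]  # most recent value first
        values.append(intercept + sum(c * v for c, v in zip(coeffs, lags)))
    return values[len(history):]


def one_step_forecast(history, actual, coeffs, intercept):
    """Predict each step from the true data that already exist in the table."""
    values = list(history)
    predictions = []
    for true_value in actual:
        lags = values[-1:-len(coeffs) - 1:-1]
        predictions.append(intercept + sum(c * v for c, v in zip(coeffs, lags)))
        values.append(true_value)  # the true value, not the prediction
    return predictions


history = [1.0, 2.0]
coeffs = [0.5, 0.25]  # hypothetical model: Y(t) = 1 + 0.5*Y(t-1) + 0.25*Y(t-2)
long_term = status_quo_forecast(history, coeffs, 1.0, horizon=2)
one_step = one_step_forecast(history, [3.0, 4.0], coeffs, 1.0)
print(long_term)
print(one_step)
```

Both runs agree on the first step and differ from the second step on, which is why the forecast cells have to be empty for a status quo prediction.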
TS Pred7
Do This
Choose the menu item Predict Time Series... in the Modeling menu.
TS Pred8
Do This
Type a value for the forecast horizon you want to use. Here, please type 5.
TS Pred9
Do This
We have true data of the forecast horizon available which we want to use for model performance validation.
To enable this, click in this checkbox. Note that if you check this item and no data, or fewer data than the forecast horizon, are available, it will not affect the prediction itself.
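A minimal Python sketch of such a validation is shown below. It assumes the mean absolute percentage error as the performance measure, which is a choice made for this illustration only; the tutorial does not say which measure KnowledgeMiner reports. The function name validation_error and the numbers are hypothetical.

```python
def validation_error(predicted, actual):
    """Mean absolute percentage error over the rows where true data exist.

    If fewer true values than predictions are available, only the
    available ones are compared; with no true data the result is None.
    The predictions themselves are never changed.
    """
    pairs = list(zip(predicted, actual))
    if not pairs:
        return None
    total = sum(abs(p - a) / abs(a) for p, a in pairs)
    return 100.0 * total / len(pairs)


predicted = [2.0, 4.0, 6.0, 8.0, 10.0]   # forecast horizon of 5
actual = [2.5, 5.0]                      # only two true values available
error = validation_error(predicted, actual)
print(error)
```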
TS Pred10
Do This
Click on the Prediction button to predict the variable. Predicted values are displayed in red.
Input-Output Models
Static Input-Output Models
Intro
In this section you will learn how to create input-output models and how to make predictions with them. You should have completed the previous sections.
We want to create a static input-output model of the COD concentration variable. Static models can be used to solve analysis, classification or diagnosis problems. They are independent of time and therefore have no time lags.
In our example, we want to consider the variables X2 to X6 as inputs.
SIOM1
Do This
First, you have to select the output variable.
Click in the first row of column X1 (COD conc.) to mark it as our output variable Y.
A "Y" in the head of the column marks this variable as the selected output variable.
For a picture of output variable selection, click Huh? below.
SIOM2
Do This
We have decided to consider all unlagged samples of X2 up to X6.
Try to select the corresponding cells in the table.
For a picture of the correct selection, click Huh? below.
SIOM3
Do This
Choose the menu item Create Input-Output-Model... from the Modeling menu.
SIOM4
Do This
Specify the number of data rows, counted from the top of the table, to be used as the learning and checking set for model synthesis. The minimum data length is 6.
Here we want to choose a length of 40.
SIOM5
Do This
If you have not specified the time lags in the table, you can do so here, provided the field is enabled (not dimmed). Since we want to create a static model, type a zero in this field.
SIOM6
Do This
This checkbox is only of interest if you want to build a system of equations. You will learn how to create a system of equations later.
For now, make sure that this checkbox is not checked.
SIOM7
You have learned how to set up a few parameters in the dialog window. Now you can start the modeling process by clicking the Modeling button in the dialog window.
Dynamic Input-Output Models
Intro
Now we want to create a dynamic input-output model of the COD conc. variable. Dynamic models are used to model and predict the dynamic behavior of a time process, that is, the evolution of a variable over time.
In our example, we want to consider as input variables for modeling the unlagged samples Xm(t) (the static part) and the first lagged samples Xm(t-1) (the dynamic part) of the variables X2 to X6 (m=2, 3, ..., 6).
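The candidate inputs of the static and the dynamic model can be listed with a short Python sketch. The function name candidate_inputs is made up for this illustration; the variable names and lags are those given above.

```python
def candidate_inputs(variables, max_lag):
    """List every input term Xm(t), Xm(t-1), ..., Xm(t-max_lag)."""
    terms = []
    for name in variables:
        for lag in range(max_lag + 1):
            terms.append(f"{name}(t)" if lag == 0 else f"{name}(t-{lag})")
    return terms


variables = ["X2", "X3", "X4", "X5", "X6"]
static_inputs = candidate_inputs(variables, max_lag=0)   # no time lags
dynamic_inputs = candidate_inputs(variables, max_lag=1)  # adds Xm(t-1)
print(static_inputs)
print(dynamic_inputs)
```

The static model has 5 candidate inputs and the dynamic model has 10, so the dynamic selection covers two rows of the table for each of the columns X2 to X6.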
DIOM1
Do This
First, you have to select the output variable.
Click in the first row of column X1 (COD conc.) to mark it as our output variable Y.
A "Y" in the head of the column marks this variable as the selected output variable.
For a picture of output variable selection, see SIOM1.
DIOM2
Do This
Choose the menu item Selection Mask On in the Table menu. If this menu item has been replaced by the name Selection Mask Off, you are already working in this mode.
This option will help you define the information basis by selecting the corresponding cells.
DIOM3
Do This
We have decided to consider all unlagged and the first lagged samples of X2 to X6.
Try to select the corresponding cells in the table.
For a picture of the correct selection, click Huh? below.
DIOM4
Do This
Choose the menu item Create Input-Output-Model... from the Modeling menu.
DIOM5
Do This
The same dialog window has opened as earlier, when you created the static model.
Try to set up the dialog window.
If you need instructions on how to set up the dialog for modeling, have a look at SIOM4-SIOM6 again.
DIOM6
Now you can start the modeling process by clicking the Modeling button in the dialog window.
Input-Output Models - Predictions
Intro
You have now created your first input-output model. Input-output models can be identified in the Models menu by their name without any suffix. Like time series models, input-output models are presented graphically and analytically by their model equation, and the modeling process has stopped by itself, too.
In contrast to time series models, the output variable of an input-output model is described by different input variables. This means that, to predict the output variable, the data of the input variables must be available for the forecast horizon. These data can be obtained from time series models or other modeling techniques, or they can be assumptions or true values. Since the prediction results depend on these input data, this kind of prediction is called a What-If prediction.
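A What-If prediction can be sketched in Python as evaluating one model for different sets of assumed input data. The linear model, its coefficients and the two scenarios below are invented for illustration; they are not the model KnowledgeMiner creates for the COD data.

```python
def what_if_prediction(model, scenario):
    """Evaluate a linear input-output model for each row of assumed inputs."""
    predictions = []
    for inputs in scenario:
        value = model["intercept"]
        for name, weight in model["weights"].items():
            value += weight * inputs[name]
        predictions.append(value)
    return predictions


# Hypothetical model: Y(t) = 1.0 + 0.5*X2(t) - 0.25*X3(t)
model = {"intercept": 1.0, "weights": {"X2": 0.5, "X3": -0.25}}

# Two assumptions about the inputs over a forecast horizon of 2.
scenario_a = [{"X2": 2.0, "X3": 4.0}, {"X2": 2.0, "X3": 8.0}]
scenario_b = [{"X2": 4.0, "X3": 0.0}, {"X2": 6.0, "X3": 0.0}]

result_a = what_if_prediction(model, scenario_a)
result_b = what_if_prediction(model, scenario_b)
print(result_a)
print(result_b)
```

The same model gives different forecasts for the two scenarios, which is what the name What-If refers to.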
IOMPred1
Do This
In our example, the true data of the forecast horizon are available, and we want to use them for model performance validation. These validation data need to be stored in the same column, below the design data. Therefore, you have to locate the corresponding first data value (row) in the table.
Scroll the 'Data:' window down until rows 40-55 are visible.
IOMPred2
Do This
Now click in row 48 of column X1 (our output variable). This cell contains the first true data value of the forecast horizon.
For a picture, click Huh? below.
IOMPred3
Do This
Choose the menu item Original Data Begin in This Row in the Table menu.
IOMPred4
Do This
Clear all red colored cells of column X1 from row 41 to row 45.
If you need instructions on clearing existing data, click Huh? below.
IOMPred5
Do This
To make a what-if prediction for existing input data, choose the menu item What-If Prediction... in the Modeling menu.
IOMPred6
Do This
Type a value for the forecast horizon you want to use. Here, please type 5.
IOMPred7
Do This
We have true data of the forecast horizon available which we want to use for model performance validation.
To enable this, click in this checkbox. Note that if you check this item and no data, or fewer data than the forecast horizon, are available, it will not affect the prediction itself.
IOMPred8
Do This
Check this checkbox.
In this way, the predicted data will be placed in the table automatically since we have cleared the corresponding cells before. Existing data, however, will not be overwritten.
IOMPred9
Do This
Click on the Prediction button to predict the variable. Predicted values are displayed in red.
System of Equations - a Network of Input-Output Models
Intro
In this section you will learn how to create and predict systems of equations as a more sophisticated way to raise prediction accuracy or, in some cases, to get a prediction at all in a reasonable time. We have seen in the previous section that making predictions with input-output models requires forecast data for all input variables. In practice this is not important for static models. For dynamic models, however, it is essential for being able to predict time processes. Therefore, and because the data you may be working on could be very short and noisy, systems of equations are one way we recommend to solve real-world problems such as analysis, prediction and classification of rather complex processes.
SYM1
We want to create a predictable linear system of equations. Suppose we have decided to consider the variables X1(t) ... X6(t) and their first and second lagged samples Xm(t-1), Xm(t-2) (m=1, 2, ..., 6) to define the information basis. For systems of equations, the selected output variable only indicates which variable will be modeled first, followed by all other selected variables.
SYM2
Do This
Choose the menu item Selection Mask On in the Table menu. If this menu item has been replaced by the name Selection Mask Off, you are already working in this mode.
This option will help you define the information basis by selecting the corresponding cells.
SYM3
Do This
Define the information basis in the known way. Remember that we have decided to consider the variables X1 to X6 and their first and second lagged samples.
For a picture or instructions on defining the information basis, click Huh? below.
SYM4
Do This
Choose the menu item Create Input-Output-Model... from the Modeling menu.
SYM5
Do This
Click in this checkbox to create a system of equations consisting of all selected variables. This system is applicable for stepwise short-term to long-term prediction of all output variables.
Since we want to create a linear system, turn the 'exclusively linear' radio button on.
SYM6
Do This
Set up the dialog and click the "Modeling" button to start the modeling process.
This process will take more or less time depending on the machine you are working on. Optionally, by clicking the Cancel button, you can skip this step and use the prepared system of equations further.
System of Equations - Predictions
Intro
You have now built your first system of equations. System models can be identified in the Models menu by the suffix '-S' added to the name of the model.
Using this system you are able to predict all variables simultaneously.
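Why a system can predict all variables at once, without forecast data from outside, can be sketched in Python. The sketch is simplified to two variables that depend only on first lagged samples, and its coefficients are invented; the system created in this tutorial covers X1 to X6 with more terms. The function name predict_system is hypothetical.

```python
def predict_system(state, coeffs, horizon):
    """Predict every variable of a linear system step by step.

    state:  dict with the latest known value of each variable.
    coeffs: coeffs[target][source] is the weight of source(t-1) in the
            equation for target(t).
    Each step uses only the values of the previous step, so the
    predictions of one step become the lagged inputs of the next.
    """
    path = []
    for _ in range(horizon):
        state = {
            target: sum(weight * state[source] for source, weight in row.items())
            for target, row in coeffs.items()
        }
        path.append(state)
    return path


# Hypothetical system:
#   X1(t) = 0.5*X1(t-1) + 0.5*X2(t-1)
#   X2(t) = 1.0*X2(t-1)
coeffs = {
    "X1": {"X1": 0.5, "X2": 0.5},
    "X2": {"X2": 1.0},
}
path = predict_system({"X1": 2.0, "X2": 4.0}, coeffs, horizon=2)
for step in path:
    print(step)
```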
SYSPRED1
Do This
Make sure that the active model is a system model. The active model is marked by a checkmark in the Models menu.
SYSPRED2
Do This
Before you can use a system for prediction you have to build a best system out of all possible systems.
Choose the menu item Best System Of Equations in the Modeling menu.
SYSPRED3
You can see that the system consists of two parts: identified and not identified variables of the system.
Identified variables are those whose corresponding model appears to reflect significant relationships. Therefore, each of them is considered to be a state variable of the system (endogenous variable).
Models of not identified variables, in contrast, appear not to be an internal part of the system, either because true state variables are missing (incomplete information basis) or because the detection is correct. Not identified variables are considered exogenous variables, and their prediction values are displayed in the table in orange.
SYSPRED4
Do This
Clear all cells of columns X1 to X6 from row 41 to row 45. For status quo predictions, all related cells need to be empty.
SYSPRED5
Do This
To make a long-term status quo prediction for all variables of the system, choose the menu item Predict System... in the Modeling menu.
SYSPRED6
Do This
Type a value for the forecast horizon over which you want to predict the system.
Here, please type 5.
SYSPRED7
Do This
Click on the Prediction button to predict the system.
SYSPRED8
Do This
Now you can choose the menu item What-If-Prediction... in the Modeling menu to see the graph for the currently active model.
The Model Base
Intro
Do This
To view the contents of the model base, click in the Models menu.
The model base stores all created models. For each column Xn of the table, a time series model (suffix '-AR'), an input-output model (no suffix) and a system model (suffix '-S') can be stored simultaneously. A document can contain one system of equations. Each newly created model will be added to the model base or, if a corresponding model already exists, will replace that model automatically.
Only one model can be active at a time. This model is marked by a checkmark. All model-related features, like viewing the model equation or the model graph or using a model for prediction, apply to the currently active model. You can select a model by choosing its corresponding menu item.
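The rules of the model base can be summarized in a small Python sketch. The class ModelBase, its method names and the way the suffix is joined to the variable name are assumptions made for illustration; only the suffixes, the replacement rule and the single active model come from the description above.

```python
# Suffix added to the variable name for each kind of model.
SUFFIXES = {"time_series": "-AR", "input_output": "", "system": "-S"}


class ModelBase:
    """Stores at most one model per variable and kind; one model is active."""

    def __init__(self):
        self.models = {}
        self.active = None

    def store(self, variable, kind, model):
        """Add a model, replacing an existing one of the same name."""
        name = variable + SUFFIXES[kind]
        self.models[name] = model
        return name

    def select(self, name):
        """Make the named model the active one."""
        if name not in self.models:
            raise KeyError(name)
        self.active = name


base = ModelBase()
base.store("COD conc.", "time_series", "first AR model")
base.store("COD conc.", "time_series", "second AR model")  # replaces the first
base.store("COD conc.", "system", "system model")
base.select("COD conc.-S")
print(sorted(base.models), base.active)
```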
MODELS1
Do This
Make the 'Graph:' window the active window.
MODELS2
Do This
Choose the system model of the Filtered COD variable to make it the active model.
Please note that the name of the active model is shown in the upper right.
MODELS3
Do This
Alternatively, you can browse through the model base forward or backward.
To select the next model, click the right arrow button in the upper right.
MODELS4
Do This
Additionally, and as far as possible, not just one best model but up to 3 best models will be created and stored in the model base after each modeling process finishes. This set of best models is stored separately in the model base for each model. It can be accessed through the submenu in the Models menu.
Select the second best time-series model of the COD conc. variable.
MODELS5
Do This
Again, you can alternatively browse through the set of best models by clicking the corresponding buttons (up and down arrow buttons in the upper right).
Note that the number of the chosen best model is shown beside the name of the model.
To select the next best model, click the up arrow button.
FINISH
Congratulations! You have finished the tutorial successfully.